Evaluating Variations on the STAR Algorithm for Relative Efficiency and Sample Sizes Needed to Reconstruct Species Trees
نویسنده
چکیده
Many methods for inferring species trees from gene trees have been developed when incongruence among gene trees is due to incomplete lineage sorting. A method called STAR (Liu et al, 2009), assigns values to nodes in gene trees based only on topological information and uses the average value of the most recent common ancestor node for each pair of taxa to construct a distance matrix which is then used for clustering taxa into a tree. This method is very efficient computationally, scaling linearly in the number of loci and quadratically in the number of taxa, and in simulations has shown to be highly accurate for moderate to large numbers of loci as well as robust to molecular clock violations and misestimation of gene trees from sequence data. The method is based on a particular choice of numbering nodes in the gene trees; however, other choices for numbering nodes in gene trees can also lead to consistent inference of the species tree. Here, expected values and variances for average pairwise distances and differences between average pairwise distances in the distance matrix constructed by the STAR algorithm are used to analytically evaluate efficiency of different numbering schemes that are variations on the original STAR numbering for small trees.
منابع مشابه
Estimation of Tree Biomass at Individual tree, Sample plot and Hybrid Level using Drone Images
Two-dimensional image conversion algorithms to 3D data create the hope that the structural properties of trees can be extracted through these images. In this study, the accuracy of biomass estimation in tree, plot, and hybrid levels using UAVs images was investigated. In 34.8 ha of Sisangan Forest Park, using a quadcopter, 854 images from an altitude of 100 meters above ground were acquired. SF...
متن کاملInstance sampling in credit scoring: An empirical study of sample size and balancing
To date, best practice in sampling credit applicants has been established based largely on expert opinion, which generally recommends that small samples of 1500 instances each of both goods and bads are sufficient, and that the heavily biased datasets observed should be balanced by undersampling themajority class. Consequently, the topics of sample sizes and sample balance have not been subject...
متن کاملMarket Adaptive Control Function Optimization in Continuous Cover Forest Management
Economically optimal management of a continuous cover forest is considered here. Initially, there is a large number of trees of different sizes and the forest may contain several species. We want to optimize the harvest decisions over time, using continuous cover forestry, which is denoted by CCF. We maximize our objective function, the expected present value, with consideration of stochastic p...
متن کاملDetection of some Tree Species from Terrestrial Laser Scanner Point Cloud Data Using Support-vector Machine and Nearest Neighborhood Algorithms
acquisition field reference data using conventional methods due to limited and time-consuming data from a single tree in recent years, to generate reference data for forest studies using terrestrial laser scanner data, aerial laser scanner data, radar and Optics has become commonplace, and complete, accurate 3D data from a single tree or reference trees can be recorded. The detection and identi...
متن کاملInvestigating the Structure of Beech Stands in the Gap Making Phase (Case study: Asalem Forests, Guilan)
Forest structure consider the spatial arrangement of trees characteristics such as age, size, species, gender and so on is.This study aimed to investigate the structural diversity of three one-hectare stands in the gap making phase, were studied. For this purpose, three sample plots with a one hectare area were selected in Asalem beech stands which belonged to the structural features of the gap...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید
ثبت ناماگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید
ورودعنوان ژورنال:
- Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing
دوره شماره
صفحات -
تاریخ انتشار 2013